pacman::p_load(tmap, sf, dplyr, DT, sp,
stplanr, performance, mapview,
ggpubr, tidyverse, httr,
units, reshape2)Take-home Exercise 2: Analyzing the Dynamics of Bus Commute Flow and Spatial Interaction in Singapore
Setting the Scene
The inquiry focuses on the key motivators prompting city residents to rise early for their daily commutes from home to work, and the consequences of discontinuing public bus services along specific routes. These issues represent significant challenges for transport operators and urban planners.
Traditionally, understanding these dynamics involved conducting extensive commuter surveys. These surveys, however, are expensive, time-intensive, and laborious. Moreover, the data collected often requires extensive processing and analysis, leading to reports that are frequently outdated by the time they are completed.
With the digitalization of urban infrastructure, including public buses, mass rapid transit systems, public utilities, and roads, new opportunities for data collection arise. The integration of pervasive computing technologies like GPS in vehicles and SMART cards among public transport users allows for detailed tracking of movement patterns across time and space.
Despite this, the rapid accumulation of geospatial data has overwhelmed planners’ capacity to effectively analyze and convert it into valuable insights. This inefficiency negatively impacts the return on investment in data collection and management.
Motivation and Objective
The purpose of this take-home project is twofold. First, it addresses the gap in applied research demonstrating the integration, analysis, and modeling of the increasingly available open data for effective policy-making. Despite the abundance of such data, there is a noticeable absence of practical studies showcasing its potential use in policy decisions.
Second, the project aims to fill the void in practical research illustrating the application of geospatial data science and analysis (GDSA) in decision-making processes.
Therefore, the assignment involves conducting a case study to showcase the value of GDSA. This will involve synthesizing publicly accessible data from various sources to construct spatial interaction models. These models will be used to identify and analyze factors influencing the urban mobility patterns of public bus transit.
1.Getting Started
The code snippet shown is responsible for loading various packages that provide essential tools and functions for the analysis.
pacman::p_load: This function from thepacmanpackage streamlines the process of loading multiple R packages. If a package is not already installed,p_loadwill install it before loading.tmap: Utilized for creating thematic maps, essential in visualizing geospatial data.sf: Stands for “simple features” and is used for handling and analyzing geospatial data.dplyr: A part of thetidyversecollection, this package is instrumental in data manipulation tasks like filtering, selecting, and summarizing data.DT: Provides an R interface to the JavaScript library “DataTables”, enabling interactive display of data in tables.sp: Offers classes and methods for spatial data, crucial for handling spatial points, lines, and polygons.stplanr: Specifically designed for sustainable transport planning with spatial data.performance: Useful for assessing and comparing the performance of statistical models.mapview: Facilitates interactive viewing of spatial data in R.ggpubr: A part of theggplot2ecosystem, this package provides additional functions for creating publication-ready plots.tidyverse: A collection of R packages designed for data science, providing tools for data manipulation, visualization, and more.httr: Used for working with HTTP protocols to access web resources.units: Deals with measurement units, crucial for handling and converting between different units of measurement in spatial data.reshape2: Aids in reshaping data, transitioning between wide and long formats, which is often necessary in data analysis.
Each of these packages plays a specific role in the analysis, ranging from data manipulation and visualization to handling spatial and web-based data.
2.Data Importing
2.1Geospatial Data Importing
This R code snippet is focused on importing and transforming geospatial data related to Singapore’s bus stops and Metropolitan Planning Strategy Zones (MPSZ) for the year 2019. Using st_read from the sf package, it loads the BusStop and MPSZ-2019 layers from a specified directory (data/geospatial). Both datasets are then transformed to the local Singapore coordinate reference system (CRS code 3414) using st_transform.
busstop <- st_read(dsn = "data/geospatial",
layer = "BusStop") %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
mpsz <- st_read(dsn = "data/geospatial",
layer = "MPSZ-2019") %>%
st_transform(crs = 3414)Reading layer `MPSZ-2019' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
Display and grasp the basic situation of geospatial dataset mpsz.
mpszSimple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
SUBZONE_N SUBZONE_C PLN_AREA_N PLN_AREA_C REGION_N
1 MARINA EAST MESZ01 MARINA EAST ME CENTRAL REGION
2 INSTITUTION HILL RVSZ05 RIVER VALLEY RV CENTRAL REGION
3 ROBERTSON QUAY SRSZ01 SINGAPORE RIVER SR CENTRAL REGION
4 JURONG ISLAND AND BUKOM WISZ01 WESTERN ISLANDS WI WEST REGION
5 FORT CANNING MUSZ02 MUSEUM MU CENTRAL REGION
6 MARINA EAST (MP) MPSZ05 MARINE PARADE MP CENTRAL REGION
7 SUDONG WISZ03 WESTERN ISLANDS WI WEST REGION
8 SEMAKAU WISZ02 WESTERN ISLANDS WI WEST REGION
9 SOUTHERN GROUP SISZ02 SOUTHERN ISLANDS SI CENTRAL REGION
10 SENTOSA SISZ01 SOUTHERN ISLANDS SI CENTRAL REGION
REGION_C geometry
1 CR MULTIPOLYGON (((33222.98 29...
2 CR MULTIPOLYGON (((28481.45 30...
3 CR MULTIPOLYGON (((28087.34 30...
4 WR MULTIPOLYGON (((14557.7 304...
5 CR MULTIPOLYGON (((29542.53 31...
6 CR MULTIPOLYGON (((35279.55 30...
7 WR MULTIPOLYGON (((15772.59 21...
8 WR MULTIPOLYGON (((19843.41 21...
9 CR MULTIPOLYGON (((30870.53 22...
10 CR MULTIPOLYGON (((26879.04 26...
Export and save in rds format for later use.
mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")2.2Aspatial Data Importing
This line of R code is used for importing an aspatial dataset named origin_destination_bus_202310.csv from a specified directory (data/aspatial). The function read_csv from the tidyverse package efficiently reads the CSV file, converting it into a dataframe. This dataset likely contains origin-destination information for bus routes in October, 2023.
odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")The glimpse(odbus) function provides a quick overview of the structure and contents of the odbus dataframe, summarizing its columns, data types, and a few initial entries.
glimpse(odbus)Rows: 5,694,297
Columns: 7
$ YEAR_MONTH <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
This code converts the ORIGIN_PT_CODE and DESTINATION_PT_CODE columns in the odbus dataframe to factors, categorizing unique bus stop codes for analysis.
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 2.3Extracting the Study Data
2.3.1Extracting the Trips Volume of Weekday Morning Peak
This code snippet filters and summarizes the odbus dataframe to extract data on bus trips during weekday morning peak hours (6 to 9 AM). It selects records marked as WEEKDAY, groups them by origin and destination bus stop codes, and then calculates the total number of trips between each stop pair in this time frame.
weekday6_9 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 6 &
TIME_PER_HOUR <= 9) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))The datatable(weekday6_9) function creates an interactive table display of the weekday6_9 dataframe, facilitating easy exploration and analysis of the data.
datatable(weekday6_9)The write_rds(weekday6_9, "data/rds/weekday6_9.rds") function saves the weekday6_9 dataframe to a file named weekday6_9.rds for future use.
write_rds(weekday6_9, "data/rds/weekday6_9.rds")The read_rds("data/rds/weekday6_9.rds") function loads the previously saved weekday6_9 dataframe back into the R environment.
weekday6_9 <- read_rds("data/rds/weekday6_9.rds")2.3.2Extracting the Trips Volume of Weekday Afternoon Peak
Repeat the process above for weekday afternoon peak (5-8 PM).
weekday17_20 <- odbus %>%
filter(DAY_TYPE == "WEEKDAY") %>%
filter(TIME_PER_HOUR >= 17 &
TIME_PER_HOUR <= 20) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekday17_20)write_rds(weekday17_20, "data/rds/weekday17_20.rds")weekday17_20 <- read_rds("data/rds/weekday17_20.rds")2.3.3Extracting the Trips Volume of Weekend/Holiday Morning Peak
Repeat the process above for weekend/holiday morning peak (11 AM-2 PM).
weekend11_14 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 11 &
TIME_PER_HOUR <= 14) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekend11_14)write_rds(weekend11_14, "data/rds/weekend11_14.rds")weekend11_14 <- read_rds("data/rds/weekend11_14.rds")2.3.4Extracting the Trips Volume of Weekend/Holiday Evening Peak
Repeat the process above for weekend/holiday evening peak (4-7 PM).
weekend16_19 <- odbus %>%
filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
filter(TIME_PER_HOUR >= 16 &
TIME_PER_HOUR <= 19) %>%
group_by(ORIGIN_PT_CODE,
DESTINATION_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))datatable(weekend16_19)write_rds(weekend16_19, "data/rds/weekend16_19.rds")weekend16_19 <- read_rds("data/rds/weekend16_19.rds")4.Geospatial Data Wrangling
4.1Combining busstop and mpsz
The code st_intersection(busstop, mpsz) combines the busstop and mpsz datasets to retain only those bus stops located within the boundaries of Singapore’s metropolitan planning zones. The select(BUS_STOP_N, SUBZONE_C) part then extracts specific columns, namely bus stop IDs and subzone codes, from the intersected dataset.
busstop_mpsz <- st_intersection(busstop, mpsz) %>%
select(BUS_STOP_N, SUBZONE_C)The mapview(busstop_mpsz) function creates an interactive map displaying the spatial data from the busstop_mpsz dataframe, visualizing the locations of bus stops within Singapore’s planning zones.
mapview(busstop_mpsz)4.2Creating Hexagon Layer
The code st_make_grid(mpsz, cellsize = 2 * 375 / sqrt(3), square = FALSE) generates a hexagonal grid overlay on the mpsz spatial data. Each hexagon in the grid has a perpendicular distance from its center to its edges of 375 meters, effectively creating a hexagonal pattern to represent Traffic Analysis Zones (TAZs).
hex_grid <- st_make_grid(mpsz, cellsize = 2 * 375 / sqrt(3), square = FALSE)This code converts the hex_grid object into a simple features (sf) dataframe hex_grid_sf, and then adds a new column hex_id that assigns a unique identifier to each hexagon in the grid.
hex_grid_sf <- st_sf(geometry = hex_grid) %>%
mutate(hex_id = 1:length(hex_grid))This code calculates the number of bus stops within each hexagon of the hex_grid_sf grid. It uses st_intersects to identify which bus stops (busstop_mpsz) fall within each hexagon, and then applies a function to count these stops for each hexagon, handling any missing values (na.rm = TRUE).
busstop_counts <- st_intersects(hex_grid_sf, busstop_mpsz, sparse = FALSE) %>%
apply(1, function(x) sum(x, na.rm = TRUE))This step assigns the computed bus stop counts (busstop_counts) to a new column bus_stop_count in the hex_grid_sf dataframe, effectively adding the number of bus stops within each hexagon to the grid data.
hex_grid_sf$bus_stop_count <- busstop_countsThis code filters the hex_grid_sf dataframe to keep only those hexagons that have one or more bus stops, effectively removing hexagons with no bus stop presence.
hex_grid_sf <- hex_grid_sf %>%
filter(bus_stop_count > 0)This code creates a visual map in R using the tmap package. It overlays hexagons from hex_grid_sf onto the mpsz spatial layout, coloring them based on the count of bus stops in each hexagon. The map features a legend, titles, and styling details for clarity and visual appeal. The final map displays the distribution of bus stops across Singapore, with additional borders for context and credits for data sources.
tmap_options(check.and.fix = TRUE)
map_hexagon <- tm_shape(hex_grid_sf) +
tm_polygons(
col = "bus_stop_count",
palette = "Purples",
style = "cont",
title = "Number of Bus Stops",
id = "hex_id",
showNA = FALSE,
alpha = 0.6,
popup.vars = c("Number of Bus Stops" = "bus_stop_count"),
popup.format = list(bus_stop_count = list(format = "f", digits = 0))
) +
tm_shape(mpsz) +
tm_borders(col = "grey75", lwd = 0.7) +
tm_layout(
main.title = "Bus Stop Distribution in Singapore",
main.title.size = 1.5,
legend.title.size = 1,
legend.text.size = 0.8,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Data Source: LTA DataMall, Data.gov.sg", position = c("RIGHT", "BOTTOM"), size = 0.8) +
tm_view(view.legend.position = c("left", "bottom"))
map_hexagon
Observations from the map:
Blank Areas: The absence of hexagons in certain parts of the map suggests these are areas with no bus stops. These could be non-residential areas, such as industrial zones, green spaces, or water bodies where public bus services are not necessary or practical.
Bus Stop-Dense Areas: Regions densely packed with hexagons indicate a high concentration of bus stops. These areas are likely to be highly urbanized with significant residential and commercial activities, necessitating a greater number of bus stops to accommodate the public transport needs of the population.
The distribution pattern reflects the urban planning and public transportation infrastructure of Singapore, designed to cater to areas with high commuter demand while excluding zones where bus stops are not viable.
4.3Correspondence of Hexagon and Bus Stop ID
The code creates a new dataset busstop_hex by intersecting busstop locations with the hex_grid_sf to assign a hex_id to each bus stop. It then selects the bus stop number and corresponding hex_id, and removes the spatial geometry data for a simple reference table. The final step omits any entries with missing values to ensure a clean dataset for merging with other data based on hexagon and bus stop IDs.
busstop_hex <- st_intersection(busstop, hex_grid_sf) %>%
select(BUS_STOP_N, hex_id) %>%
st_drop_geometry()busstop_hex <- na.omit(busstop_hex)The head(busstop_hex) function displays the first few rows of the busstop_hex dataframe for a quick preview of its structure and data.
head(busstop_hex) BUS_STOP_N hex_id
3269 25059 393
254 26379 444
2570 25751 488
4203 26389 490
2403 26369 491
2897 25761 535
Export and save busstop_hex in rds format for future use.
write_rds(busstop_hex, "data/rds/busstop_hex.rds") Remove items no longer needed from R environment to free memory and avoid redundancy.
rm(hex_grid, map_hexagon, odbus, busstop_counts)5.Preparing Commute Flow Data
5.1Weekday Morning Peak Flow
The code merges flow data from weekday6_9 with bus stop IDs from busstop_hex based on common origin bus stop numbers, then renames the key columns for clarity, assigning hex IDs to origin points and preserving destination bus stop codes.
weekday_morning_od <- left_join(weekday6_9 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)Before continue, it is a good practice for us to check for duplicating records (It can be judged based on data frame is blank or not).
duplicate <- weekday_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()If duplicated records are found, the code chunk below will be used to retain the unique records.
weekday_morning_od <- unique(weekday_morning_od)This code further enhances the weekday_morning_od dataframe by joining it with the busstop_hex dataframe to append hex_id information corresponding to the destination bus stops, linking each trip’s endpoint to its respective hexagonal spatial zone.
weekday_morning_od <- left_join(weekday_morning_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekday_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekday_morning_od <- unique(weekday_morning_od)This step renames the destination hex ID column for clarity, removes any rows with missing data, then groups the data by origin and destination hex IDs, and summarizes it to calculate the total number of trips made during the weekday morning peak hours between each hexagon pair.
weekday_morning_od <- weekday_morning_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKDAY_MORNING_PEAK = sum(TRIPS))It is time to save the output into an rds file format.
write_rds(weekday_morning_od, "data/rds/weekday_morning_od.rds")weekday_morning_od <- read_rds("data/rds/weekday_morning_od.rds")5.2Weekday Afternoon Peak Flow
Repeat the process above for weekday afternoon peak (5-8 PM).
weekday_afternoon_od <- left_join(weekday17_20 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)duplicate <- weekday_afternoon_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekday_afternoon_od <- unique(weekday_afternoon_od)weekday_afternoon_od <- left_join(weekday_afternoon_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekday_afternoon_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekday_afternoon_od <- unique(weekday_afternoon_od)weekday_afternoon_od <- weekday_afternoon_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKDAY_AFTERNOON_PEAK = sum(TRIPS))write_rds(weekday_afternoon_od, "data/rds/weekday_afternoon_od.rds")weekday_afternoon_od <- read_rds("data/rds/weekday_afternoon_od.rds")5.3Weekend Morning Peak Flow
Repeat the process above for weekend/holiday morning peak (11 AM-2 PM).
weekend_morning_od <- left_join(weekend11_14 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)duplicate <- weekend_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_morning_od <- unique(weekend_morning_od)weekend_morning_od <- left_join(weekend_morning_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekend_morning_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_morning_od <- unique(weekend_morning_od)weekend_morning_od <- weekend_morning_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKEND_MORNING_PEAK = sum(TRIPS))write_rds(weekend_morning_od, "data/rds/weekend_morning_od.rds")weekend_morning_od <- read_rds("data/rds/weekend_morning_od.rds")5.4Weekend Evening Peak Flow
Repeat the process above for weekend/holiday evening peak (4-7 PM).
weekend_evening_od <- left_join(weekend16_19 , busstop_hex,
by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
rename(ORIGIN_BS = ORIGIN_PT_CODE,
ORIGIN_HEX = hex_id,
DESTIN_BS = DESTINATION_PT_CODE)duplicate <- weekend_evening_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_evening_od <- unique(weekend_evening_od)weekend_evening_od <- left_join(weekend_evening_od , busstop_hex,
by = c("DESTIN_BS" = "BUS_STOP_N"))duplicate <- weekend_evening_od %>%
group_by_all() %>%
filter(n()>1) %>%
ungroup()weekend_evening_od <- unique(weekend_evening_od)weekend_evening_od <- weekend_evening_od %>%
rename(DESTIN_HEX = hex_id) %>%
drop_na() %>%
group_by(ORIGIN_HEX, DESTIN_HEX) %>%
summarise(WEEKEND_EVENING_PEAK = sum(TRIPS))write_rds(weekend_evening_od, "data/rds/weekend_evening_od.rds")weekend_evening_od <- read_rds("data/rds/weekend_evening_od.rds")Remove items no longer needed to free memory and avoid redundancy.
rm(weekday6_9, weekday17_20, weekend11_14, weekend16_19)6.Visualising Commute Flow
6.1Removing Intra-zonal Flows
The code filters out intra-zonal flows from four separate data frames by excluding rows where the origin hexagon (ORIGIN_HEX) is the same as the destination hexagon (DESTIN_HEX). This step is crucial for analysis as it removes trips that start and end within the same traffic analysis zone, focusing the study on inter-zonal movements which are more significant for understanding commuting patterns and the broader transportation network efficiency.
weekday_morning_od1 <- weekday_morning_od[weekday_morning_od$ORIGIN_HEX!=weekday_morning_od$DESTIN_HEX,]weekday_afternoon_od1 <- weekday_afternoon_od[weekday_afternoon_od$ORIGIN_HEX!=weekday_afternoon_od$DESTIN_HEX,]weekend_morning_od1 <- weekend_morning_od[weekend_morning_od$ORIGIN_HEX!=weekend_morning_od$DESTIN_HEX,]weekend_evening_od1 <- weekend_evening_od[weekend_evening_od$ORIGIN_HEX!=weekend_evening_od$DESTIN_HEX,]6.2Creating the Desire Lines
The code uses the od2line function to transform the inter-zonal flow data from each time period into desire lines, which are spatial representations of the volume and direction of trips between different hexagons. These lines are prepared for visualization, allowing us to graphically depict and analyze commuting patterns for different times of the day and week on a map.
weekday_morning_flowLine <- od2line(flow = weekday_morning_od1,
zones = hex_grid_sf,
zone_code = "hex_id")weekday_afternoon_flowLine <- od2line(flow = weekday_afternoon_od1,
zones = hex_grid_sf,
zone_code = "hex_id")weekend_morning_flowLine <- od2line(flow = weekend_morning_od1,
zones = hex_grid_sf,
zone_code = "hex_id")weekend_evening_flowLine <- od2line(flow = weekend_evening_od1,
zones = hex_grid_sf,
zone_code = "hex_id")6.3Visualising the Desire Lines
6.3.1Weekday Morning Peak Commute Flow Map
This visualization code uses the tmap package to plot the hexagonal grid as a base, draws the boundaries of the metropolitan planning zones (MPZ), and overlays the desire lines representing high-volume weekday morning commute flows in Singapore. The lines are styled to vary in width and color intensity based on the volume of commutes, with thicker, darker lines indicating higher numbers of trips.
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekday_morning_flowLine %>%
filter(WEEKDAY_MORNING_PEAK >= 5000)) +
tm_lines(
lwd = "WEEKDAY_MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekday Morning Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
Insight from the Map:
The “Weekday Morning Commute Flows in Singapore” map reveals significant commuter traffic between various regions during peak morning hours. The thicker, darker lines suggest heavy flow between central business districts and outlying residential areas, indicating a typical urban commute pattern where many residents travel towards city centers for work. Areas with dense hexagon clusters, likely representing central and suburban residential zones, show extensive outward flow, highlighting these as key commuter hubs. Conversely, some regions exhibit sparse lines, suggesting lower population density or less reliance on public bus transit.
6.3.2Weekday Afternoon Peak Commute Flow Map
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekday_afternoon_flowLine %>%
filter(WEEKDAY_AFTERNOON_PEAK >= 5000)) +
tm_lines(
lwd = "WEEKDAY_AFTERNOON_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekday Afternoon Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
Insight from the Map:
The “Weekday Afternoon Commute Flows in Singapore” map suggests a reverse commute pattern from the morning, with significant flows from central areas to the outskirts, likely as people return home from work. The dense lines indicate high traffic volumes, particularly from the CBD and other employment hubs to residential districts. The distribution and volume of these lines can indicate areas with a high demand for evening public transportation services, and such insights could inform enhancements to bus service capacity and frequency to meet commuter needs during peak hours.
6.3.3Weekend Morning Peak Commute Flow Map
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekend_morning_flowLine %>%
filter(WEEKEND_MORNING_PEAK >= 3000)) +
tm_lines(
lwd = "WEEKEND_MORNING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 13, 15),
n = 7,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekend Morning Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
Insight from the Map:
The “Weekend Morning Commute Flows in Singapore” map shows a noticeable reduction in volume and fewer dense flow lines compared to weekdays, reflecting a typical decrease in commuting activity during weekends. The flows that are present may indicate travel to weekend-specific destinations like markets, recreational areas, or places of worship. The patterns suggest that the weekend movement is more dispersed and possibly oriented towards leisure or non-work-related activities, contrasting the concentrated, work-directed flows of weekday mornings.
6.3.4Weekend Evening Peak Commute Flow Map
tm_shape(hex_grid_sf) +
tm_polygons(
border.col = "grey50",
border.alpha = 0.6,
alpha = 0.1
) +
tm_shape(weekend_evening_flowLine %>%
filter(WEEKEND_EVENING_PEAK >= 3000)) +
tm_lines(
lwd = "WEEKEND_EVENING_PEAK",
style = "quantile",
scale = c(0.1, 1, 3, 5, 7, 10),
n = 6,
alpha = 0.5,
palette = "Blues"
) +
tm_shape(mpsz) +
tm_borders(
col = "darkblue",
alpha = 0.1,
lwd = 1.5
) +
tm_layout(
main.title = "Weekend Evening Commute Flows in Singapore",
main.title.position = "center",
main.title.size = 1.0,
legend.title.size = 0.8,
legend.text.size = 0.7,
legend.position = c("left", "bottom"),
frame = FALSE,
inner.margins = c(0.05, 0.05, 0.05, 0.05)
) +
tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)
Insight from the Map:
The “Weekend Evening Commute Flows in Singapore” map likely indicates an increase in volume compared to the morning, with more pronounced traffic flows towards residential areas and perhaps popular evening destinations. This contrasts with weekday evenings, where the flow would primarily be homeward from work centers. The patterns may suggest leisure and social activities are influencing travel, with possibly greater flows to entertainment or dining hubs and a more dispersed pattern as people return from various activities across the city.
rm(duplicate, weekday_morning_flowLine, weekday_morning_od, weekday_morning_od1,
weekend_morning_flowLine, weekend_morning_od, weekend_morning_od1,
weekend_evening_flowLine, weekend_evening_od, weekend_evening_od1)7.Attractiveness Factors
In the subsequent section, we will delve into a targeted analysis of the weekday afternoon peak period. The reason for this focus stems from the comprehensive nature of commute drivers during this time. On weekday afternoons, residents are not only heading home from work or school but often transition to leisure activities such as shopping or entertainment venues directly. This complexity presents a rich tapestry of commuting patterns worthy of in-depth examination.
7.1Entertainment Distribution Integration
Entertainment distribution is considered an attractiveness factor because these venues often draw people to travel to them, influencing commuting and traffic flows. The provided code reads in the entertn geospatial data and then calculates the count of entertainment venues within each hexagon of the hex_grid_sf grid, integrating this data to quantify the level of entertainment-related attractiveness of different areas.
entertn <- st_read(dsn = "data/geospatial",
layer = "entertn")Reading layer `entertn' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$entertn_count <- lengths(st_intersects(hex_grid_sf, entertn))7.2Food & Beverage Distribution Integration
Food and beverage (F&B) venues act as attractiveness factors because they are destinations that people may frequently visit after work, thus influencing traffic and transportation patterns within an area.
FB <- st_read(dsn = "data/geospatial",
layer = "F&B")Reading layer `F&B' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$FB_count <- lengths(st_intersects(hex_grid_sf, FB))lere <- st_read(dsn = "data/geospatial",
layer = "Liesure&Recreation")Reading layer `Liesure&Recreation' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$lere_count <- lengths(st_intersects(hex_grid_sf, lere))retail <- st_read(dsn = "data/geospatial",
layer = "Retails")Reading layer `Retails' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$retail_count <- lengths(st_intersects(hex_grid_sf, retail))trainexits <- st_read(dsn = "data/geospatial",
layer = "Train_Station_Exit_Layer")Reading layer `Train_Station_Exit_Layer' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 565 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6134.086 ymin: 27499.7 xmax: 45356.36 ymax: 47865.92
Projected CRS: SVY21
trainexits <- st_transform(trainexits, st_crs(hex_grid_sf))
hex_grid_sf$trainexits_count <- lengths(st_intersects(hex_grid_sf, trainexits))hdb <- read_csv("data/aspatial/hdb.csv")residential <- hdb %>%
filter(residential == "Y") %>%
select(lat, lng, total_dwelling_units) %>%
na.omit()residential <- st_as_sf(residential,
coords = c("lng", "lat"),
crs = 4326) %>%
st_transform(crs = st_crs(hex_grid_sf))intersections <- st_intersects(hex_grid_sf, residential, sparse = TRUE)
hex_grid_sf$residential_count <- mapply(function(index, residential) {
sum(residential$total_dwelling_units[index], na.rm = TRUE)
}, intersections, MoreArgs = list(residential = residential))rm(intersections)8.Propulsive Factors
8.1Data Integration
business <- st_read(dsn = "data/geospatial",
layer = "Business")Reading layer `Business' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$business_count <- lengths(st_intersects(hex_grid_sf, business))url<-"https://www.onemap.gov.sg/api/common/elastic/search"
csv<-read_csv("data/aspatial/Generalinformationofschools.csv")
postcodes<-csv$`postal_code`
found<-data.frame()
not_found<-data.frame()
for(postcode in postcodes){
query<-list('searchVal'=postcode,'returnGeom'='Y','getAddrDetails'='Y','pageNum'='1')
res<- GET(url,query=query)
if((content(res)$found)!=0){
found<-rbind(found,data.frame(content(res))[4:13])
} else{
not_found = data.frame(postcode)
}
}merged = merge(csv, found, by.x = 'postal_code', by.y = 'results.POSTAL', all = TRUE)
write.csv(merged, file = "data/aspatial/schools.csv")
write.csv(not_found, file = "data/aspatial/not_found.csv")schools <- read_csv("data/aspatial/schools.csv") %>%
rename(latitude = "results.LATITUDE",
longitude = "results.LONGITUDE")%>%
select(postal_code, school_name, latitude, longitude)schools <- schools %>%
filter(!is.na(longitude) & !is.na(latitude)) %>%
st_as_sf(coords = c("longitude", "latitude"), crs = 4326) %>%
st_transform(crs = 3414)hex_grid_sf$school_count <- lengths(st_intersects(hex_grid_sf, schools))rm(csv, found, merged, not_found, query, res,
postcode, postcodes, url)finserv <- st_read(dsn = "data/geospatial",
layer = "FinServ")Reading layer `FinServ' from data source
`D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3320 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4881.527 ymin: 25171.88 xmax: 46526.16 ymax: 49338.02
Projected CRS: SVY21 / Singapore TM
hex_grid_sf$finserv_count <- lengths(st_intersects(hex_grid_sf, finserv))8.2Final Check
datatable(hex_grid_sf)9.Computing Distance Matrix
9.1Converting from sf data.table to SpatialPolygonsDataFrame
hex_grid_sp <- as(hex_grid_sf, "Spatial")
hex_grid_spclass : SpatialPolygonsDataFrame
features : 1853
extent : 3966.576, 48566.88, 26373.72, 50123.72 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 11
names : hex_id, bus_stop_count, entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count, business_count, school_count, finserv_count
min values : 393, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0
max values : 9988, 13, 8, 71, 26, 986, 9, 3531, 60, 4, 91
9.2 Computing the distance matrix
dist <- spDists(hex_grid_sp,
longlat = FALSE)
head(dist, n=c(10, 10)) [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0.0000 2633.9134 866.0254 2291.2878 3031.0889 750.0000 1984.3135
[2,] 2633.9134 0.0000 1887.4586 433.0127 433.0127 2291.2878 866.0254
[3,] 866.0254 1887.4586 0.0000 1500.0000 2250.0000 433.0127 1145.6439
[4,] 2291.2878 433.0127 1500.0000 0.0000 750.0000 1887.4586 433.0127
[5,] 3031.0889 433.0127 2250.0000 750.0000 0.0000 2633.9134 1145.6439
[6,] 750.0000 2291.2878 433.0127 1887.4586 2633.9134 0.0000 1500.0000
[7,] 1984.3135 866.0254 1145.6439 433.0127 1145.6439 1500.0000 0.0000
[8,] 4175.8233 1561.2495 3381.9373 1887.4586 1145.6439 3750.0000 2250.0000
[9,] 1732.0508 1299.0381 866.0254 866.0254 1561.2495 1145.6439 433.0127
[10,] 2410.9127 750.0000 1561.2495 433.0127 866.0254 1887.4586 433.0127
[,8] [,9] [,10]
[1,] 4175.823 1732.0508 2410.9127
[2,] 1561.249 1299.0381 750.0000
[3,] 3381.937 866.0254 1561.2495
[4,] 1887.459 866.0254 433.0127
[5,] 1145.644 1561.2495 866.0254
[6,] 3750.000 1145.6439 1887.4586
[7,] 2250.000 433.0127 433.0127
[8,] 0.000 2633.9134 1887.4586
[9,] 2633.913 0.0000 750.0000
[10,] 1887.459 750.0000 0.0000
9.3Labeling Column and Row Headers of a Distance Matrix
hex_names <- hex_grid_sf$hex_idcolnames(dist) <- paste0(hex_names)
rownames(dist) <- paste0(hex_names)9.4Pivoting Distance Value by HEX_ID
distPair <- melt(dist) %>%
rename(dist = value)
head(distPair, 10) Var1 Var2 dist
1 393 393 0.0000
2 444 393 2633.9134
3 488 393 866.0254
4 490 393 2291.2878
5 491 393 3031.0889
6 535 393 750.0000
7 537 393 1984.3135
8 540 393 4175.8233
9 583 393 1732.0508
10 584 393 2410.9127
9.5Updating Intra-zonal Distances
distPair %>%
filter(dist > 0) %>%
summary() Var1 Var2 dist
Min. : 393 Min. : 393 Min. : 433
1st Qu.:3694 1st Qu.:3694 1st Qu.: 7949
Median :5474 Median :5474 Median :12933
Mean :5236 Mean :5236 Mean :13705
3rd Qu.:6837 3rd Qu.:6837 3rd Qu.:18407
Max. :9988 Max. :9988 Max. :44478
distPair$dist <- ifelse(distPair$dist == 0,
200, distPair$dist)distPair %>%
summary() Var1 Var2 dist
Min. : 393 Min. : 393 Min. : 200
1st Qu.:3694 1st Qu.:3694 1st Qu.: 7949
Median :5474 Median :5474 Median :12933
Mean :5236 Mean :5236 Mean :13698
3rd Qu.:6837 3rd Qu.:6837 3rd Qu.:18407
Max. :9988 Max. :9988 Max. :44478
distPair <- distPair %>%
rename(orig = Var1,
dest = Var2)write_rds(distPair, "data/rds/distPair.rds") 10.Combining Passenger Volume Data with Distance Value
weekday_afternoon_od1$ORIGIN_HEX <- as.factor(weekday_afternoon_od1$ORIGIN_HEX)
weekday_afternoon_od1$DESTIN_HEX <- as.factor(weekday_afternoon_od1$DESTIN_HEX)
distPair$orig <- as.factor(distPair$orig)
distPair$dest <- as.factor(distPair$dest)weekday_afternoon_od2 <- weekday_afternoon_od1 %>%
left_join (distPair,
by = c("ORIGIN_HEX" = "orig",
"DESTIN_HEX" = "dest"))hex_grid_df <- as.data.frame(hex_grid_sf) %>%
select(hex_id, bus_stop_count, business_count, school_count, finserv_count,
entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count) %>%
mutate(hex_id = as.character(hex_id))origin_factors <- hex_grid_df %>%
select(hex_id, bus_stop_count, business_count, school_count, finserv_count)
weekday_afternoon_od2 <- weekday_afternoon_od2 %>%
mutate(ORIGIN_HEX = as.character(ORIGIN_HEX),
DESTIN_HEX = as.character(DESTIN_HEX))weekday_afternoon_od2_with_origin <- weekday_afternoon_od2 %>%
left_join(origin_factors, by = c("ORIGIN_HEX" = "hex_id"))destin_factors <- hex_grid_df %>%
select(hex_id, entertn_count, FB_count, lere_count, retail_count, trainexits_count, residential_count)weekday_afternoon_od2_complete <- weekday_afternoon_od2_with_origin %>%
left_join(destin_factors, by = c("DESTIN_HEX" = "hex_id"))glimpse(weekday_afternoon_od2_complete)Rows: 161,671
Columns: 14
Groups: ORIGIN_HEX [1,808]
$ ORIGIN_HEX <chr> "393", "393", "393", "393", "393", "393", "393"…
$ DESTIN_HEX <chr> "535", "585", "723", "770", "779", "824", "827"…
$ WEEKDAY_AFTERNOON_PEAK <dbl> 3, 34, 182, 1, 1, 18, 6, 145, 2, 16, 250, 23, 2…
$ dist <dbl> 750.000, 3122.499, 1561.249, 1887.459, 7697.402…
$ bus_stop_count <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ business_count <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0,…
$ school_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ finserv_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ entertn_count <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ FB_count <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ lere_count <int> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0,…
$ retail_count <int> 0, 1, 0, 0, 0, 0, 3, 0, 2, 0, 3, 9, 2, 1, 0, 0,…
$ trainexits_count <int> 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0,…
$ residential_count <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
write_rds(weekday_afternoon_od2_complete, "data/rds/SIM_data.rds")rm(business, busstop, busstop_hex, busstop_mpsz, destin_factors, dist, distPair, entertn, FB, finserv, hdb, hex_grid_df, hex_grid_sf, hex_grid_sp, lere, mpsz, origin_factors, residential, retail, schools, trainexits, weekday_afternoon_flowLine, weekday_afternoon_od, weekday_afternoon_od1, weekday_afternoon_od2, weekday_afternoon_od2_complete, weekday_afternoon_od2_with_origin, hex_names)11.Calibrating Spatial Interaction Models
Importing the modelling data
SIM_data <- read_rds("data/rds/SIM_data.rds")Visualising the dependent variable
ggplot(data = SIM_data,
aes(x = WEEKDAY_AFTERNOON_PEAK)) +
geom_histogram()
ggplot(data = SIM_data,
aes(x = dist,
y = WEEKDAY_AFTERNOON_PEAK)) +
geom_point() +
geom_smooth(method = lm)
ggplot(data = SIM_data,
aes(x = log(dist),
y = log(WEEKDAY_AFTERNOON_PEAK))) +
geom_point() +
geom_smooth(method = lm)
Checking for Variables with Zero Values
summary(SIM_data) ORIGIN_HEX DESTIN_HEX WEEKDAY_AFTERNOON_PEAK dist
Length:161671 Length:161671 Min. : 1.0 Min. : 433
Class :character Class :character 1st Qu.: 4.0 1st Qu.: 2411
Mode :character Mode :character Median : 18.0 Median : 4684
Mean : 148.4 Mean : 5658
3rd Qu.: 73.0 3rd Qu.: 8020
Max. :59391.0 Max. :25159
bus_stop_count business_count school_count finserv_count
Min. : 1.000 Min. : 0.000 Min. :0.0000 Min. : 0.000
1st Qu.: 2.000 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.: 0.000
Median : 3.000 Median : 0.000 Median :0.0000 Median : 1.000
Mean : 3.401 Mean : 2.556 Mean :0.1807 Mean : 3.944
3rd Qu.: 4.000 3rd Qu.: 2.000 3rd Qu.:0.0000 3rd Qu.: 4.000
Max. :13.000 Max. :60.000 Max. :4.0000 Max. :91.000
entertn_count FB_count lere_count retail_count
Min. :0.0000 Min. : 0.000 Min. : 0.0000 Min. : 0.00
1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.: 0.0000 1st Qu.: 2.00
Median :0.0000 Median : 0.000 Median : 0.0000 Median : 7.00
Mean :0.1352 Mean : 2.135 Mean : 0.8934 Mean : 38.84
3rd Qu.:0.0000 3rd Qu.: 1.000 3rd Qu.: 1.0000 3rd Qu.: 31.00
Max. :8.0000 Max. :71.000 Max. :26.0000 Max. :986.00
trainexits_count residential_count
Min. :0.0000 Min. : 0.0
1st Qu.:0.0000 1st Qu.: 0.0
Median :0.0000 Median : 219.0
Mean :0.6436 Mean : 674.6
3rd Qu.:0.0000 3rd Qu.:1264.0
Max. :9.0000 Max. :3531.0
SIM_data$business_count <- ifelse(
SIM_data$business_count == 0,
0.99, SIM_data$business_count)
SIM_data$school_count <- ifelse(
SIM_data$school_count == 0,
0.99, SIM_data$school_count)
SIM_data$finserv_count <- ifelse(
SIM_data$finserv_count == 0,
0.99, SIM_data$finserv_count)
SIM_data$entertn_count <- ifelse(
SIM_data$entertn_count == 0,
0.99, SIM_data$entertn_count)
SIM_data$FB_count <- ifelse(
SIM_data$FB_count == 0,
0.99, SIM_data$FB_count)
SIM_data$lere_count <- ifelse(
SIM_data$lere_count == 0,
0.99, SIM_data$lere_count)
SIM_data$retail_count <- ifelse(
SIM_data$retail_count == 0,
0.99, SIM_data$retail_count)
SIM_data$trainexits_count <- ifelse(
SIM_data$trainexits_count == 0,
0.99, SIM_data$trainexits_count)
SIM_data$residential_count <- ifelse(
SIM_data$residential_count == 0,
0.99, SIM_data$residential_count)summary(SIM_data) ORIGIN_HEX DESTIN_HEX WEEKDAY_AFTERNOON_PEAK dist
Length:161671 Length:161671 Min. : 1.0 Min. : 433
Class :character Class :character 1st Qu.: 4.0 1st Qu.: 2411
Mode :character Mode :character Median : 18.0 Median : 4684
Mean : 148.4 Mean : 5658
3rd Qu.: 73.0 3rd Qu.: 8020
Max. :59391.0 Max. :25159
bus_stop_count business_count school_count finserv_count
Min. : 1.000 Min. : 0.990 Min. :0.990 Min. : 0.990
1st Qu.: 2.000 1st Qu.: 0.990 1st Qu.:0.990 1st Qu.: 0.990
Median : 3.000 Median : 0.990 Median :0.990 Median : 1.000
Mean : 3.401 Mean : 3.138 Mean :1.019 Mean : 4.391
3rd Qu.: 4.000 3rd Qu.: 2.000 3rd Qu.:0.990 3rd Qu.: 4.000
Max. :13.000 Max. :60.000 Max. :4.000 Max. :91.000
entertn_count FB_count lere_count retail_count
Min. :0.990 Min. : 0.990 Min. : 0.990 Min. : 0.99
1st Qu.:0.990 1st Qu.: 0.990 1st Qu.: 0.990 1st Qu.: 2.00
Median :0.990 Median : 0.990 Median : 0.990 Median : 7.00
Mean :1.045 Mean : 2.862 Mean : 1.539 Mean : 38.99
3rd Qu.:0.990 3rd Qu.: 1.000 3rd Qu.: 1.000 3rd Qu.: 31.00
Max. :8.000 Max. :71.000 Max. :26.000 Max. :986.00
trainexits_count residential_count
Min. :0.990 Min. : 0.99
1st Qu.:0.990 1st Qu.: 0.99
Median :0.990 Median : 219.00
Mean :1.399 Mean : 675.02
3rd Qu.:0.990 3rd Qu.:1264.00
Max. :9.000 Max. :3531.00
Unconstrained Spatial Interaction Model
uncSIM <- glm(formula = WEEKDAY_AFTERNOON_PEAK ~
log(bus_stop_count) +
log(business_count) +
log(school_count) +
log(finserv_count) +
log(entertn_count) +
log(FB_count) +
log(lere_count) +
log(retail_count) +
log(trainexits_count) +
log(residential_count) +
log(dist),
family = poisson(link = "log"),
data = SIM_data,
na.action = na.exclude)
uncSIM
Call: glm(formula = WEEKDAY_AFTERNOON_PEAK ~ log(bus_stop_count) +
log(business_count) + log(school_count) + log(finserv_count) +
log(entertn_count) + log(FB_count) + log(lere_count) + log(retail_count) +
log(trainexits_count) + log(residential_count) + log(dist),
family = poisson(link = "log"), data = SIM_data, na.action = na.exclude)
Coefficients:
(Intercept) log(bus_stop_count) log(business_count)
11.67689 0.40019 -0.06794
log(school_count) log(finserv_count) log(entertn_count)
-0.31202 0.42704 0.05788
log(FB_count) log(lere_count) log(retail_count)
-0.17995 -0.10079 0.05820
log(trainexits_count) log(residential_count) log(dist)
0.69347 0.12209 -1.04916
Degrees of Freedom: 161670 Total (i.e. Null); 161659 Residual
Null Deviance: 96290000
Residual Deviance: 59350000 AIC: 60130000
R-squared Function
CalcRSquared <- function(observed,estimated){
r <- cor(observed,estimated)
R2 <- r^2
R2
}CalcRSquared(uncSIM$data$WEEKDAY_AFTERNOON_PEAK, uncSIM$fitted.values)[1] 0.1063576
r2_mcfadden(uncSIM)# R2 for Generalized Linear Regression
R2: 0.381
adj. R2: 0.381